ORTHOGONAL POLYNOMIALS OF THE R-LINEAR GENERALIZED 
MINIMAL RESIDUAL METHOD 
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Abstract. The speed of convergence of the R-linear GMRES method is bounded in terms of 
a polynomial approximation problem on a finite subset of the spectrum. This result resembles the 
classical GMRES convergence estimate except that the matrix involved is assumed to be condiago- 
nalizable. The bounds obtained are applicable to the CSYM method, in which case they are sharp. 
Then a three term recurrence for generating a family of orthogonal polynomials is shown to exist, 
yielding a natural link with complex symmetric Jacobi matrices. This shows that a mathematical 
framework analogous to the one appearing with the Hermitian Lanczos method exists in the complex 
symmetric case. The probability of being condiagonalizable is estimated with random matrices. 
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1. Introduction. Suggested in [5], there exists an M-linear GMRES (generalized 
minimal residual) method for solving a large real linear system of equations of the 
form 

Kz + M^z = b (1.1) 

for K £ C, € C"^" and 6 £ C". Systems of this type appear regularly in applica- 
tions. This is manifested by the complex symmetric case which corresponds to k = 
and M^'^ — M^. (For its importance in applications, such as the numerical solution 
of the complex Helmholtz equation, see [H].) Then the M-linear GMRES method 
reduces to the CSYM method [5]. We have k 0, e.g., in an approach to solve the 
electrical conductivity problem [5] which requires solving an R-linear Beltrami equa- 
tion [21] . For a wealth of information regarding real linearity, see [22l • Although the 
R-linear GMRES method is a natural scheme, its properties are not well understood. 
Assuming to be condiagonalizable, in this paper a polynomial approximation 
problem on the plane is introduced for assessing its speed of convergence. In the com- 
plex symmetric case a three term recurrence for generating orthogonal polynomials 
arises, leading to a natural link with complex symmetric Jacobi matrices. 

The bounds obtained are intriguing by the fact that they show that the conver- 
gence depends on the spectrum of the real linear operator involved. So far it has 
not been clear what is the significance of the spectrum in general and for iterative 
methods in particular |17[ Here it is shown to play a role similar to what the 
spectrum does in the classical GMRES bounds [29]. A striking difference is that the 
bounds reveal a strong dependence of the speed of convergence on the vector. 

Moreover, with any natural Krylov subspace method there exists a connection 
between the iteration and orthogonal functions. As a rule, these are associated with 
normality. The Hermitian Lanczos method is related with a three term recurrence 
for generating orthogonal polynomials; see |15| and references therein. For unitary 
matrices the corresponding length of recurrence is five [28]; see also [30]. These are 
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special instances of the general framework for normal matricefQ described in [TFl [H] . 
In this paper an analogous connection is established in the complex symmetric case 
to orthogonalize monomials 

1, A, |Ap, A|Ap, |A|4, A|A|^... (1.2) 

with a three term recurrence. This link is not entirely unexpected by the fact that 
antilinear operators involving a complex symmetric matrix have been regarded as 
yielding an analogue of normality [171 P- 250]. As opposed to the Hermitian Lanczos 
method, the structure is richer now as orthogonality based on a three term recurrence 
and respective rapid least squares approximation is possible on very peculiar curves 
in C (and not just on subsets of M which are also admissible); see the assumptions of 
Theorem 14 . 5 1 for the admissible curves. The arising family of functions can be viewed 
to extend radial functions in a natural way. 

Unlike diagonalizability in the complex linear case, condiagonalizability is a more 
intricate structure. Random matrix theory is invoked to assess how likely it is to 
have a condiagonalizable operator in (jl.ip . In this manner we end up touching many 
aspects of the theory that has been linked by the classical Hermitian Lanczos method 
in recent years [6]. 

The paper is organized as follows. In Section [2] bounds on the R-linear GMRES 
convergence are derived in the condiagonalizable case. The probability of a matrix 
being condiagonalizable is assessed in Section|31 Section|3]is concerned with the theory 
of orthogonal polynomials related with the M-linear Arnoldi method. It is shown that 
complex symmetry is naturally treated within antilinear structure. Only then its rich 
properties become visible. In Section [5] some preliminary numerical experiments are 
presented. 

2. Condiagonalizability and the convergence of the M-linear GMRES. 

Condiagonalizability means that the real linear operator appearing on the left-hand 
side of (|l.ll) is diagonalizable. Before deriving the bounds, we first recall how Krylov 
subspaces are generated with an R-linear operator in (jl.ip by executing the R-linear 
Arnoldi method. 

2.1. Krylov subspaces of the R-linear GMRES. When C" is regarded as 
a vector space over C, any real linear operator can be presented as 

2 1 — > Mz={M + M^t)z = Mz + M^z (2.1) 

with matrices M,M^ G C"^". Here r denotes the conjugation operator on C". The 
set of eigenvalues, i.e., the spectrum of a real linear operator Ai = M + M^t is defined 
as 

{a e C I Mz = Xz for some z O} . 

The spectrum is an algebraic set of degree 2n at most. For more details on the real 
linear eigenvalue problem, see [9l [20| . 

In this paper we are interested in having M ~ kI for a scalar k G C. Then the 
real linear operator is denoted by A^^- In this case the spectrum possesses a relatively 
simple structure as follows. 

Proposition 2.1. The spectrum of Ai^. consists of circles centred at k. 



^ The length of recurrence depends on what is the least possible degree for an algebraic curve to 
contain the eigenvalues. 
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The eigenvalues of M-q, i.e., circles centred at the origin, are also called the 
coneigenvalues of the matrix [17] . 

To describe methods to compute Krylov subspaces with M-linear operators, we 
follow [9j Section 3.1]. Executing the iteration with J\4k. starting from a vector b S C", 
we obtain the Krylov subspace 

IC,{M.; b) = span{6, . . . , A^r 

= span{6, Mj), M^W^b, M^M^M^b, . . .} 

which is hence independent of k. For this an orthonormal basis can be computed 
numerically reliably by invoking the real linear Arnoldi method O p. 820]. In partic- 
ular, if dmilCj{A4K',b) = n and Q denotes the respective unitary matrix having the 
orthonormal basis vectors as its columns, then Q*M^Qt is the respective representa- 
tion of M^T in this basis. 

The following simple fact is of importance. 

Proposition 2.2. Let X e C"^" be invertible. Then 

= /C,(A/;;c) 

with iV# = X~'^M#X and c = X'^b. 

If X is unitary, then the corresponding sequences of Krylov subspaces are in- 
distinguishable in the standard Euclidean geometry, i.e., all the corresponding inner 
products computed coincide. 

In the R- linear GMRES method for solving suggested in [9 , at the jth step 
one imposes the minimum residual condition 

min ||XkZ-6|| 

for the approximation to satisfy. With appropriate modifications taking into account 
the real linearity, the iteration can be implemented to proceed like the classical GM- 
RES [29]. In particular, if is either symmetric or skew-symmetric, then the 
iteration can be realized in terms of a three term recurrence. 

It is noteworthy that the M-linear GMRES converges at least as fast as the stan- 
dard GMRES applied to the real system of doubled size obtained by separating the 
real and imaginary parts in This fact is not surprising. For Krylov subspace 

methods, not writing complex problems in a real form has been advocated already in 
[TTl p. 446]. Thereby understanding the convergence of the R- linear GMRES is of 
central relevance. 

As a final remark, to precondition the linear system such that the structure 
is preserved, see [2TJ Section 4.2]. 

2.2. Polynomial approximation problem of the R-linear GMRES con- 
vergence. The following notion is needed in what follows. 

Definition 2.3. A matrix € (j^nxn ^^^^ ^.^ condiagonalizable if there 
exists an invertible matrix X G C"-x" such that 



Af# = XA#X-^ (2.2) 

with a diagonal matrix . 
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The diagonal entries of are also called the coneigenvalues of . (Admittedly, 
there is a minor, although trivial inconsistency here compared with the comment 
following Proposition 12. II ) 

Analytic polynomials are not sufficient to deal with real linear operators. The 
following subclass of (polyanalytic) polynomial^ is of central relevance for the R- 
linear GMRES. 

Definition 2.4. Polynomials of the form 

LiJ 

5^(a2fc + a2fc+iA)|A|''= (2.3) 

fc=0 

with afc G C and, for j even aj+i = 0, are denoted by7-'j{r2). Their union U°?^g'Pj(r2) 
is denoted by V{r2). 

Clearly, Vj{r2) is a vector space over C of dimension j + 1. With the restriction 
A S M we are dealing with standard analytic polynomials. It is, however, more natural 
to contrast Vj(r2) with radial functions. This and the notation used will be explained 
Section [321 

Observe that problems involving the conjugated variable are becoming more com- 
mon in applications. Gravitational lensing is one such instance . 

Like in the standard GMRES polynomial approximation problem, it is critical how 
well a nonzero constant can be approximated with the elements of Vj (r2) . As usual, 
we denote the condition number of a matrix X £ C"'*" by K2{X) — \\X\\ . 

Theorem 2.5. Suppose e £nxn condiagonalizahle as (|2.2p . If D is a 
unitary diagonal matrix satisfying D^^X^^b G M", then 



min \\M.kZ — b\\ < K2(X) min max Kp(A) + Ap(A) — 1 

where denotes the diagonal matrix D^^A^D. 

Proof. Recall that ICj{Mi^;b) = ICj{Aio;b) holds for any k G C. Take any 
z 6 ICj{A4o;b) and set w = D^^X^^z. Denote D^^X^^b by r. Since the vector r is 
real, we obtain 

w = D-'X-^z = D-^X-\Y^ atM'^b) = ^ a^V^iD^^X-^b) ^ ^ a^P^r 

fc=0 fc=0 fc=0 



L^J 

= £ (a2fc + a2fc+l^#)(^#i5#)^ 
fc=o 

for some constants afe £ C with and a2L(j-i)/2j+i = for j odd. This is a polynomial 
in a diagonal, i.e., normal matrix and its adjoint. Hence we have obtained a link 
between polynomials in A and A. Now we have 

-b\\< \ \XD\ \ \ \V^w-r\ \ = ||A:D|| \\kw + D^w - r\\ . 

Then again, since the vector r is real, 

\\kw + D^w — r| I < 

^Polyanalytic polynomials are polynomials in A and A [3]. 
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k=0 k=0 



< max 



Kp{X) + Xp{X) - 1 Udx)-^ \\b\\ 



where p belongs to Vj-i{r2). Since H^i'll ||(X_D)^^|| = K2{X), the claim follows 
from this. □ 

Observe that in D~^A^D the jth diagonal entry of A# has been multiplied by 
e~2'^J , where e*^^ is the jth diagonal entry of D. 

The key here is the fact that the latter minimisation problem is of standard 
type. Being part of classical approximation theory of functions, there is no linear 
algebra involved]! However, unlike the usual GMRES bound 29, Section 3.4], the 
point set cr(-D#) depends strikingly on the vector b. (The choice of D to make X^^b 
real depends on 6.) It is a finite subset of the spectrum consisting of at most n 
points, though. Generically these points are unique. (Generic here means that M^, 
when condiagonalizable, is assumed to have distinct coneigenvalues.) Moreover, for 
an appropriate choice of 6, it can be any subset of the spectrum with the restriction 
that the number coneigenvalues of of the same modulus does not change. 

The bound shows also that the notion of "spectral radius" for a diagonalizable 
antilinear operator is natural. Observe that, by executing the real linear Arnoldi 
method, it is straightforward to estimate the extreme coneigenvalues of a large (and 
possibly sparse) M^. The rationale is analogous to the way the classical Arnoldi 
method yields eigenvalue approximations. 

The convergence behaviour of the CSYM method has been regarded as somewhat 
puzzling as well, partly because of the somewhat unaccesible structure of the appear- 
ing Krylov subspaces. For some comparisons between other iterative methods, see 
[5j[23]. (Lack of understanding the convergence is not just of theoretical interest. It 
can prevent efficient preconditioning.) The following yields a way to look at it. 

Corollary 2.6. For the CSYM method we can choose X to he unitary to have 



min IIA^o^^^llf; niin max Ap(A) — 1 

zeK.j(Mo;b) peVj-i(r2) Xea{D#) 

where = D^^A^J). 

These bounds are clearly sharp jl6j . 

Observe that if cr{D^) is on a line through the origin, then the CSYM method 
reduces to the MINRES (minimal residual) method [26] for Hermitian matrices. In 
this case the convergence can be regarded as well understood. For instance, then the 
convergence can be expected to be faster if the origin is not included in the convex 
hull of the spectrum. The difference can be dramatic as well. 

3. The probability of condiagonalizability. In complex linear matrix anal- 
ysis, a linear operator is diagonalizable with probability one. Therefore the analysis 
of the speed of convergence of iterations based on classical approximation theory of 
functions on the spectrum is generically a viable approach. In a typical case it can be 
expected to yield good estimates. 



■^A term coined by P. Halmos, noncommutative approximation theory means matrix (operator) 
approximation problems in general. 
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Although the set of condiagonahzable matrices includes complex symmetric ma- 
trices, a subspace of C"^" of dimension n{n + l)/2, assuming condiagonalizability 
turns out to be much more restrictive than assuming diagonalizability. Quantitatively 
this can be expressed in terms of the following result on random matrices. 

Theorem 3.1. Let G (^nxn ^^^g entries with real and imaginary parts drawn 
independently from the standard normal distribution. Then the probability that M# is 
condiagonahzable is 2~"("~i)/2. 

One should bear in mind that in practice matrices possess a lot of structure (such 
as complex symmetry). Thereby, regarding the usage of the bounds of Section [2] in 
applications, this is certainly an overly pessimistic result. 

The rest of this section is dedicated to the proof of Theorem 1 3. II The probability 
that a real n-by-n matrix with standard normal entries has only real eigenvalues has 
been shown to equal 2~"("~i)/** ^8^. From Proposition l3.3l below it is easy to see that a 
real matrix is condiagonahzable with the same probability. For the complex matrices 
of Theorem 13. 11 our computation of the probability proceeds similarly to [8]. 

3.1. Contriangularizable matrices. We start by recalling basic facts on ma- 
trices and consimilarity needed in the proof. A standard reference here is [171 Chapter 
4]. 

Definition 3.2. A matrix G (j^nxn ^^^^ contriangularizable if there 

exists an invertible matrix X e c^x" such that 

M# = XR^Y^ (3.1) 

with an upper triangular matrix . 

A matrix is said to be unitarily contriangularizable if M# = UR^U'^ with U 
unitary and R^ upper triangular. 

Proposition 3.3. Suppose Af# g C"^". Then 

1. is contriangularizable if and only if is unitarily contriangularizable 
if and only if all the eigenvalues of M^M^ are real and nonnegative. 

2. if — U R^U^ with U unitary and i?# upper triangular, the absolute values 
of the diagonal entries of R are always the same, modulo ordering. The 
diagonal entries of R^ can be permuted to any order and chosen to be real 
and nonnegative. 

3. if — UR^U'^ with U unitary and R^ upper triangular, where the absolute 
values \rii\, \r22\i • ■ • j l^nnl of the diagonal entries of R^ are distinct, then 

is condiagonahzable. Moreover, the set of such matrices is open in C"^". 

4. The set 

{M# e C"''" I M# = UR#U'^ with \ru\ = \rjj \ for some i ^ j) 

is of measure zero. Hence almost all contriangularizable matrices are condi- 
agonahzable. 

Proof. The item (1) is [T71 Theorem 4.6.3] and the other claims follow readily 
from the results of 17, Section 4.6]. □ 

Proposition 13.31 (|4]) combined with Theorem 13.11 yields the corollary that the 
probability of a matrix being contriangularizable is 2""^"^^)/^. 

We next prove a uniqueness result which holds true for almost all contriangular- 
izable matrices. The following lemma is needed. 

Lemma 3.4. Let i?, 5 e C"X" upper triangular matrices such that \rii\ = \sii\ 
and \rii\ ^ l^jjl for all i ^ j . If U E C"xn ^ unitary matrix such that 

RU^US (3.2) 
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then U is a diagonal matrix. 

Proof. By Proposition 13.31 ([3]), R and S are condiagonalizable and we can find 
upper triangular invertible matrices X,Y G C"^" such that 

R^X^^DX, S = YDY^\ 

where D is the real diagonal matrix such that da = \rii\. Substituting into p.2p we 
find 



DXUY = XUYD. 



Denoting E = XUY, we see that E must be diagonal since da are distinct. Hence 
U — X^^EY~^ is upper triangular and therefore diagonal since U is unitary. □ 

Proposition 3.5. Let e C"^" and suppose M# = UR^U^ = VS#V^, 
where U, V are unitary, R^ , are upper triangular with the same diagonal consisting 
of distinct real and positive entries. Then there exists a diagonal matrix D G R"^" 
with ±1 diagonal entries such that 

U = VD, 

3.3 

R# = DS^D. ^ ' 



Proof. From the assumptions we get R^U*V ~ U*VS^. By Lemma 13.41 the 
matrix D — U*V is diagonal and we see that i?# = DS#D. Since i?# and S# have 
the same nonzero diagonal, the diagonal of D must have ±1 entries. □ 

3.2. Proof of Theorem 13. IL Since the manipulations that follow require heav- 
ily using matrix indices, we denote the matrix of Theorem 13.1 1 bv A. 
The computation of the probability involves evaluating the integral 

p„ = ^-^J^e-i^^(^'^^dAAdA, (3.4) 

where dA — S!i j=i ddij and T) is the set of condiagonalizable matrices that possess n 
positive and distinct coneigenvalues. 

To compute p„ we perform the change of variables A = URU^ ^ where U is 
unitary, R d TZ and 

7?. = {i? G C"^" I i? is upper triangular and < rn < • • • < r„„} . 

To calculate the corresponding Jacobian we use the notation [dB] to denote the nxn- 
matrix of the differential forms dbij . Since only the absolute value of the Jacobian 
is of interest, in the following we will ignore unconsequential sign changes due to the 
anti-commutativity of the wedge product. Also, we shall ignore the imaginary unit in 
the volume form, i.e. for z ^ x + iy we write dz /\ dz ~ 2 dx /\ dy. Then 

[dA] = [dU]RU^ + U[dR]U'^ + C/i?[dC/]^ 
= U{[dR] + U*[dU]R + R[dUfU)U^. 

Denoting 



[dH] = U*[dU] 
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we have [dH] skew-Hermitian and therefore 

[dA] = U[dM]U^, where [dM] = [dR] + [dH]R - R[dH]. 

Hence 

dA = det([/)2"dM 

and 

dA^d'A^ dM A dM. (3.5) 
We now divide the calculation to three cases 

dM A dM = {drriij A drnij) A ^{dma A dfriii) A {drriij A drfTij). (3.6) 

Suppose first that i > j. Then 

dmy = dhijTjj - rudhij + ^ dhiurkj - ^ rikdhtj. (3.7) 



Actually 



/\(dm,j A cfTnij) = /\(r2^. - )d/iy A dh,j. (3.8) 



To see this, first note that 

{dhijVjj - Tiidhij) A [dhijTjj - rudhij) = (rjj - r^i)dhtj A dhij. (3.9) 

That the last two summations in p.7p make no contribution to p.8p , consider ordering 
their terms first by the increasing second index v of dhuv (and dhuv) and then by the 
decreasing first index u. The elimination starts with dhni (and c?/i„i) and proceeds 
in the described order. We repeatedly use the reduction 

f\{dmij A dfrnj) = Wi A (^2 + ^ dhuv) A (r^^ - rlu)dhuv A 

= wi A tJ2 A (r^^ - rlu)dhuv A d/i™, 

where a;i,a;2 are some differential forms and 7 is ± some entry of R. 
We next consider the case i ^ j in p.6p . Now 

dmii = dru + 2riidhii + ^ dhikrui - ^ Vikdhki- (3.10) 

We get 

/y (dmy A (imij) A f\{dmu A dro^) 



= l\{dmij A (M~") A f\(Arudhu A dr^^), 



(3.11) 



since the terms in the last two summations in (|3.10p are eliminated due to 



POLYNOMIALS OF THE M-LINEAR GMRES 



9 



The remaining case is i < j. Now 

dm^j = drij + ^ dhikVkj - ^ rtkdhkj ■ (3-12) 

All terms in the last two summations are now eliminated due to (jS.lip so that we 
finally get 



dM A dM = 4" ru \\{r'^jj - r^) /\ (dh,^ A dh,j A dr„ A dr-)A 

i i<j 

l\{dhii A dru). 



(3.13) 



We then use p.Sp to compute the integral p.4p by integrating over the unitary group 
and the upper triangular matrices R 

= ^;r7^ / e-^ *^(«*«) dM A dM, 

where the factor 2" corresponds to the fact that by Proposition 13 . 5 1 integration over 
U (n) X TZ counts all matrices A precisely 2" times. 

The volume of the unitary group [1, Proposition 4.1.14] is 

/\ [dh,, A dh~) A f\ dhu = Yl ^ [y - 

The integral over the strict upper triangular part of R is 

i<j 

The integral over the diagonal of the matrices R can be computed using Selberg's 
integral [321 Formula 17.6.6] 

/ 5 Jl r,, Jl (r|^- - r% ) drj i • ■ • dr„„ 

Jdiag(7?,) ^ 

1 noo poo 1 



Hence 



2"(47r)" ^ ^ ^ ^ 

4. Complex symmetry, orthogonal polynomials and three term recur- 
rence. The connection between the Hermitian Lanczos method, Hermitian Jacobi, 
i.e., Hermitian tridiagonal matrices and orthogonal polynomials is standard material 
in numerical linear algebra and classical analysis; see, e.g., [131 [TS] and [5^ 131) . 

Condiagonalizability is a special property which implies that a linear algebra 
problem turns into a problem in classical approximation theory. In what follows, an 
analogous connection for antilinear operators involving a complex symmetric matrix 
is described. For complex symmetric matrices, see the classical publications listed in 
[T71 p. 218]. See also [T^] and references therein for complex symmetric operators on 
separable Hilbert spaces. 
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4.1. Construction. For the connection, consider an antilinear operator 



on C" involving a complex symmetric matrix M^. (We could equally well consider 
Mk, but for the simplicity of the presentation, we set k = 0.) Take a unit vector 
£ C". Then executing the real linear Arnoldi method yields us a tridiagonal complex 
symmetric matrix, i.e., a complex symmetric Jacobi matrix because of the following 
fact. 

Proposition 4.1. fPj/ If M^^ — cM^ with c = ±1, then the real linear Arnoldi 
method is realizable with a three term recurrence. 

Because of the way the real linear Arnoldi method proceeds, in the resulting 
tridiagonal complex symmetric matrix there can appear complex entries only on the 
diagonal. In what follows, when c — 1, the real linear Arnoldi method is called the 
complex symmetric Lanczos method. 

As in the proof of Theorem 12.51 choose a unitary matrix U such that 

= U*Mjj is diagonal and r^U*be W" (4.1) 



holds. Then ICj{M^T;b) is unitarily equivalent to ICj{D^T;r) in the sense of Propo- 
sition [221 For the latter Krylov subspace, the conjugations affect only, yielding 
polynomials in and which correspond to elements of Vj{2r) in a natural way. 

We assume that for any triple of the nonzero coneigenvalues of M#, at most two 
of them can share the same modulus, and, if zero is a coneigenvalue, it appears just 
once. This assumption holds generically. Moreover, we assume the starting vector 
6 e C" to be generic in the sense that the eigenvalues of are distinct and all the 
entries of r are strictly positive. 

By Proposition [2T2] (and the comment that follows), the Jacobi matrix computed 
by the complex symmetric Lanczos method with M^t using the starting vector b 
yields the same Jacobi matrix as when executed with D^t using the starting vector 
r. (Of course, the orthonormal bases generated differ according to Proposition! 
Denote the entries of this matrix as 



ai Pi 

Pi Oi2 '■■ 



... Pn~l 
... /3„_i an 



(4.2) 



so that the corresponding antilinear operator is J#t. The complex symmetric Lanczos 
method is devised in such a way that the entries satisfy aj G C and Pj > assuming 
the method does not break down. (When the classical Hermitian Lanczos method is 
executed, the respective entries satisfy aj e M and Pj > 0.) 

For the converse, assume given and the task is construct a diagonal matrix D 
and a real vector v giving after executing the complex symmetric Lanczos method. 
This can be accomplished by computing a unitary matrix V whose first column v is 
real such that J^r = V*DVt with a diagonal matrix D. 

We have lack of uniqueness in the case there appears two coneigenvalues of the 
same modulus. In (|4.1I) this takes place since for any isometric V G C"^^ we have 
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VV'^ = {VR){VR)'^ for all orthogonal matrices R G M^^^. For the sake of complete- 
ness, the following proposition contains the converse. 

Proposition 4.2. Suppose UU'^ = VV"^ for two isometric matrices U,V € 
C"^™. Then V = UR for a unitary matrix i? e M"^™. 

Proof We have U*V = U*VVW = U*UU^V = U^V = TFV. Take R = U*V . 

□ 

For orthogonal polynomials, associate with each point \j £ cr{D^) the weight r|, 
where Tj is the jth entry of the vector r. Denote by (•,•) the standard Euclidean 
inner product on C". Then an inner product on 'Pj{r2) corresponding to the complex 
symmetric Lanczos method is defined as 

{p,q) = {p{M^T)b,q{AUT)b) 



{p{D^T)r,q{D^T)r)=Y,p{\u)q{\k)rl 



k=l 

where we used_p(Af^T)fe = ELoMM^r)''b = C/ELo C^Ei=o(a2fe + 

a2k+iD#){D#D^)'^r and similarly for q{M^T)b. 

Consider the (discrete) monomial functions in (|1.2p . In terms of the Jacobi matrix 
entries in (j4.2p . the three term recurrence for computing the respective orthogonal 
polynomials can be expressed as 

PoW - 1 

^iPi(A) = A po(A) - aipo(A) 

/32P2(A) = Api(A) -a2Pi(A) -/3ipo(A) ^"^ 

l^aPsW = Ap2(A) - a3P2(A) - ;52Pi(A) 

and so on. Note that the assumption of having distinct eigenvalues with any triple 
of them having at most two values of the same modules together with r| > for all 
J and Proposition 14.31 below implies that the Lanczos method does not break down. 

A number A G C is called a zero of p 6 T'{r2) if p{X) — 0. 

Proposition 4.3. Let p e 'Pj{r2) be nonzero. The following claims hold: 

1. If p has two distinct zeroes of the same modulus, then all numbers of that 
modulus are zeroes. 

2. Let ni be the number of moduli for which all numbers of that modulus are 
zeroes and let s be the number of moduli for which exactly one number is a 
zero. Then 2m + s < j. 

Proof. Let u and v be (ordinary) polynomials of degrees at most [ij and ['^^J, 
respectively, such that 

p{x)^u{m+\v{\xn (4.4) 

By the assumption of Item[l] there exist Ai and A2 such that Ai 7^ A2, |Ai| = IA2I and 
p(Ai) — p{X2) = 0. This together with (|4.4p implies u(|Ai|^) — w(|Ai|^) — proving 
the first claim. 

Let Ml, . . . , Mm be the moduli for which all numbers of these moduli are zeroes. 
By factoring, there exist (ordinary) polynomials u and v such that 

ni m 

u{\M') = u{\Xf) HilXf - Mf), v{\Xf) = mxf) HilXf - Mf). 
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Note that deg(w) < [iJ — m and deg(w) < [^^-J-J — 777,. Let A be a zero of p such that 
no other number is a zero of the same modulus. Then M(|Ap) + Ai^dAp) — and 

^\x\')WF)-\MM\M')WF)-o, 

where the left-hand side is a nonzero (ordinary) polynomial in |Ap of degree at most 
J — 2to. Hence s < j — 2m. □ 

Note that p need not have any zeroes at all. 

If <j{D^) C M holds, then the conjugations are vacuous and we have the classical 
symmetric Lanczos method [27]0 And conversely, if a{D^) ^ E holds, then we 
have a natural extension of the symmetric Lanczos method preserving the length 
of recurrence. Thereby, the numerical behaviour in finite precision, i.e., the loss of 
orthogonality among vectors computed can be expected to be similar to the classical 
symmetric Lanczos method. See [27l Chapter 13.3] for the effects of finite precision 
then. 

Certainly, complex symmetric Jacobi matrices can be treated in the C-linear 
setting [4 . (Then one has to deal with formal orthogonal polynomials.) However, we 
do not find it perhaps quite as natural as through the connection with the complex 
symmetric Lanczos method prescribed. 

4.2. Interpolation and least squares approximation. For the interpolation 
with the elements of 'Pj(r2), it is straightforward to construct Vandermonde-type 
matrices from the monomials (|1.2p . (Numerically this is not advisable, though.) To 
understand their invertibility, consider the case of having exactly two interpolation 
points for each appearing modulus. That is, assume there are k different moduli 
ri > r2 > ■ ■ ■ > rk and 2k points in all. Take the Lagrange interpolation basis 
polynomials 

Hence, /;(|A/p) = 1 while /i(|Am|-^) — for m ^ I. Now, for any two distinct interpola- 
tion nodes with modulus ri, take the unique interpolating polynomial P2W — Q +diX. 
Then P2I1 S Vj{r2) with j = 2k — 1. Taking the sum of these yields the required in- 
terpolant. 

Consequently, the notation V{r2) used is explained as follows. With the elements 
of ■pj(r2) we may interpolate at most two points on a circle. Recall that with the radial 
polynomials S^^pafclAI*^ one can interpolate at most one point on a circle. Repeating 
this idea, it is clear how to define 'Pj(rfc) in such a way that we may interpolate at 
most k points on a circle. Hence we have a natural extension of radial functions. 

Since numerically computations involving orthogonal functions is preferable, in- 
terpolation with the elements of Vj (r2) should be performed by executing the complex 
symmetric Lanczos method just described. (Of course, for the classical symmetric 
Lanczos method this is a standard approach already from the late 1950s [101 US]-) 
This is straighforward by choosing with the diagonal entries equaling the inter- 
polation nodes and r any unit vector supported at the nodes. 



l<rn.<k,m=jLl ' ™ 



■^By the (classical) symmetric Lanczos method we mean the three term recurrence for transform- 
ing a real symmetric matrix into tridiagonal form. 
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4.3. Approximation of continuous functions on curves. The interpolation 
scheme just presented suggests on what kind of curves we can expect the approxima- 
tion to be successful. For approximating continuous functions with the elements from 
'P{r2) we now give a generalization of the Weierstrass approximation theorem. We 
start with the following lemma. 

Lemma 4.4. Let 7 C C &e a compact simple open curve such that 7 intersects 
every origin centred circle in at most two points. Then 7 can be extended to a simple 
closed curve 7 C C such that 7 C 7 and 7 intersects every origin centred circle in at 
most two points. 

Proof. Let ri > (and ^2 > 0) be the supremum (infimum) of the values r such 
that every origin centred circle with radius at most (at least) r does not intersect 7 
(if G 7 then define ri =0). Then the circle of radius ri intersects 7 either in one 
point or two points. Similarly for the circle of radius r2. In case of two intersection 
points, it is easy to extend 7 so that without loss of generality we may assume the 
circles of radius ri and r2 each intersect 7 at exactly one point. 

Let pi (and P2) be the supremum (infimum) of the values r such that every origin 
centred circle with radius p, where ri < p < r (r < p < r2), intersects 7 in exactly 
two points (we may assume such values r exist since 7 can be easily extended to 
accommodate this). It follows that circles of radius r such that pi < r < p2 intersect 
7 at exactly one point which we denote z(r). Denote by Wj and Vj {j — 1,2) the 
intersection points of the arc 7 with the circle of radius pj and further choose Wj 
as one of the end points of the arc 7. Note that \wj\ — \vj\ = pj and by defining 
z{pj) — Vj the function r z{r) becomes continuous in [pi, p2\. 

Let 

e = - Umi{\V2 W2 , \W2 - W2 ) 

2 Vl 

and (by continuity) choose R such that pi < R < p2 and 

\z{r) - V2\ < e for all r > R. (4.5) 

We define the extension 7 as follows. Let 

f , , Wl I 
70 = < z{r) — \pi<r<R 
L Vl 

Note that this is a rigid rotation and therefore the simple curve 7 U 70 intersects all 
circles in at most two points. Next, let a — z{R)wi/ vi and 

f f ■ r \ P2 — r , ^ r — R \ . 

71 = i rexp iarg(Q!) +ta,Tg[w2) ] \ R < r < p2 

L V P2- R P2- RJ 

Note that 7 = 7 U 70 U 71 is closed and intersects all circles in at most two points. 
Due to (|4.5|) it is also a simple curve provided the direction of rotation of the spiral 
is properly chosen either clockwise or counter-clockwise, i.e. we choose a real number 
t such that t < arg(Q!), arg(w2) < t + 2it. □ 

Theorem 4.5. Let ^ a C be a compact simple (open or closed) curve such that 
7 intersects every origin centred circle in at most two points. Let / : 7 — > C 6e a 
continuous function and suppose e > 0. Then there exists a polynomial p G P(r2) 
such that 



max 1/(2;) — p{z)\ < e. 



(4.6) 
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Proof. By Lemma 14.41 we may assume 7 is a closed simple curve. Let ri > (and 
f2 > 0) be the supremum (infimum) of the values r such that every origin centred 
circle with radius at most (at least) r does not intersect 7 (if g 7 then define ri =0). 
Then the circle of radius Vj (j — 1,2) intersects 7 at exactly one point which we denote 
by Wj. Furthermore, every circle of radius r such that ri < r < r2 intersects 7 at 
exactly two points which we denote by zi (r) and Z2 (r) chosen in one of the two ways 
to make r i-> Zj{r) continuous (j — 1,2). We also define zi(ri) = Z2{ri) ~ wi and 

2^1 (''2) = Z2{r2) = W2. 

It is easy to see that there exists a continuous function 5 : 7 — C such that g is 
constant in a neighbourhood of wi and a neighbourhood of W2 and 

max|/(z) - g{z)\ < ^. 
2:67 2 

We then define the functions ai, 02 : [ri, r2] — > C by 

,9iz2{r)) - .g(zi(r)) 



ai{r) = .g(^i(r)) - Zi{r 
a2(r) = 



Z2{r) - zi{r) 

.g(z2(r)) - 5(2:1 (r)) 



(4.7) 



Z2ir) - zi{r) 

The functions ai and a2 are continuous since we chose g to be constant near wi and 
'W2- Note that g{z) — ai{\z\)+a2{\z\)z for all z G 7. By the Weierstrass approximation 
theorem for compact intervals on the real line, there exists ordinary polynomials pi 
and p2 such that 

max \aj{r) -Pjir^)] < . (j = 1,2). 

ri<r<r2 4(^1+ 7^2 j 

We then define p{z) = pi(|zp) + p2(|^P)-2 and see that p G V{r2). Also 
\g{z)-p{z)\ < K(|z|) -pi(|z|2)| + |a2(|z|) ~P2{\z\')\\z\ < I 

for all z G 7. The estimate (|4.6p then readily follows. □ 

It is noteworthy that compact subsets of M are admissible. This is the case in the 
Hermitian Lanczos method. 

The exponential function is the most important example of a nonrational (cer- 
tainly continuous) function. In the present context we obtain it as a limit of elements 
in V{r2) as follows. 

Example 1. Consider a condiagonalizable Af# G C"^" as in (|2.2p . The 
corresponding semigroup is defined as 



i=o 

for t G M. (Then e**^*'^a;o solves the initial value problem x' — M^x, x{0) = xq.) 
When applied to a vector 6 G C" such that D~^X~^b = r G M", we obtain the 
associated exponential function 

°° 1 \ 

when looking at the problem in the corresponding basis. Of course, this reduces to 
the standard exponential function for A G M. 
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5. Numerical experiments. We now present very preliminary numerical ex- 
periments on the relationship between Vj{2r) and the convergence of the M-linear 
GMRES method. For simplicity, we focus specifically on the CSYM method. We 
do not have solutions to the polynomial minimization problems of Theorem 12.51 and 
Corollary 12.61 However, numerical results on the respective diagonal linear systems 
with a real right-hand side unveil some of the intricacies of the latter problem. 

In all the examples given below the CSYM method is thus executed to solve 

D#x = r, 

where € C"^" is a diagonal matrix and r G M" is such that all its entries are ones. 
The diagonal entries of are set as 

d,, (5.1) 

where i?i = 1 and i?„ = 10 while the other values of Rj are linearly interpolated 
between these two extremes. The angles (pj are specificied in each case separately and 
described below. For each example we plot the diagonal of and the logj^Q of the 
relative residual \\r — D^j^\\/\\r\\, where Xj G ICj-i{D^T;r) is the minimizing vector 
with the starting vector xo = 0. We used n = 500 in each problem. 

Before describing the examples, we want to mention a feature which we find puz- 
zling. Namely, the numerical results depend on how accurately the entries (|5.1|) are 
generated. This is illustrated in the first example below. All the computations were 
carried out in MATLA^l variable-precision arithmetic with an accuracy of 30 decimals 
(recall that double-precision floating point numbers have approximately 16 decimals). 
The high-precision arithmetic was chosen in order to show that the apparent numeri- 
cal instability of double-precision floating point computations seems to result from the 
input itself rather than any serious cancellation effect in the minimal residual al- 
gorithm. There is no qualitative change to the results by using more than 30 decimals 
of accuracy. At the moment we do not have an explanation for this behaviour. 

The actual examples are set up as follows. In the first example we illustrate the 
comment made after Corollarv 12.61 i.e., when a{D^) is on a line through the origin, 
the CSYM method reduces to the MINRES method. Then in the examples that 
follow, the line is deformed into more complicated shapes. The rate of convergence 
slows down accordingly. 

• Example 1. Here we chose (pj = 1/10 for all j; see the left panel of Figure 
15.11 for a{D^). The matrix was first computed in high-precision and in 
this case the residual dropped in a straight line. The matrix was then 
converted to double-precision format and back to high-precision format. The 
computation was performed again giving the slower convergence starting at 
approximately 10~^. Similar effect would be seen in Example 2 as well if the 
precision of the input was lowered. 

• Example 2. Here we computed using five different sets of angles. We chose 
(j)^^^ = 0, (jfif^^ = N, where iV = 1, . . . , 5, and the rest of (j)^^^ were linearly 
interpolated between the extremes. See Figure 15.21 ^ 

• Example 3. Let </>! = 0, 0„ = 1 and the rest of (pj linearly interpolated 
between the extremes (Example 2 case N — 1). Additionally, a random 
vector p G C" was generated with entries uniformly distributed between 
and 10^^". Given an integer k such that 1 < A: < n, we chose four different 
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Fig. 5.1. Example 1 




Real axis Number of iterations 

Fig. 5.2. Example 2. The diagonal of in the case N = 2 is plotted on the left panel. 

sets of angles in (|5.1|) by (pj — 4'j + Pj for j ^ k and 0^ = for j > k. 
The results are displayed in Figure 15.31 
• Example 4. Let <j>j and p be as in Example 3. Given an integer K such that 

1 < K < n, we now chose four different sets of angles in (|5.1|) by (f)^^^^ = 
(k) — 

for j < n — K and — (f>j + pj for j > n — K . The results are displayed in 
Figure 15.41 
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Fig. 5.3. Example 3 
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Fig. 5.4. Example 4 

• Example 5. We chose two different sets of angles. For the first set, (j)^^'^ was 
chosen uniformly distributed between and 1. For the second set, we chose 

(2) (2) " (2) 

0i = O,0„ = 1 and for every odd j the angle is linearly interpolated 

(2) 

between the extremes. For even j we linearly interpolated cpj between and 
2. The results in Figure [^751 show that the residual makes almost no progress 
at every other iteration step. The residual for the second set of angles closely 
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Fig. 5.5. Example 5. Only the diagonal of in the two spirals case is plotted on the left panel. 



follows, but makes more even progress at all iteration steps. 
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